从具有高隐私要求的领域(例如医疗干预空间)获得的真实数据较低,并且收购在法律上很复杂。因此,这项工作提供了一种以医疗服装为例为医疗环境创建合成数据集的方法。目的是缩小合成数据和真实数据之间的现实差距。为此,使用虚幻的引擎插件或Unity比较了3D扫描服装和设计服装的方法。此外,还使用了绿屏和目标域数据集的混合现实数据集。我们的实验表明,设计服装的结构性域随机化以及混合现实数据提供了基线,可在临床目标域的测试数据集上实现72.0%的地图。当使用15%可用的目标域列车数据时,针对100%(660张图像)目标域列车数据的差距几乎可以关闭80.05%的地图(81.95%地图)。最后,我们表明,当使用100%目标域训练数据时,精度可以提高到83.35%的地图。
translated by 谷歌翻译
We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes. Recent results in the literature show that representations learned by a single classifier over many classes are competitive on few-shot learning problems with representations learned by special-purpose algorithms designed for such problems. We offer an explanation for this phenomenon based on the concept of class-features variability collapse, which refers to the training dynamics of deep classification networks where the feature embeddings of samples belonging to the same class tend to concentrate around their class means. More specifically, we examine the few-shot error of the learned feature map, which is the classification error of the nearest class-center classifier using centers learned from a small number of random samples from each class. Assuming that the classes appearing in the data are selected independently from a distribution, we show that the few-shot error generalizes from the training data to unseen test data, and we provide an upper bound on the expected few-shot error for new classes (selected from the same distribution) using the average few-shot error for the source classes. Additionally, we show that the few-shot error on the training data can be upper bounded using the degree of class-features variability collapse. This suggests that foundation models can provide feature maps that are transferable to new downstream tasks even with limited data available.
translated by 谷歌翻译
The mechanism of existing style transfer algorithms is by minimizing a hybrid loss function to push the generated image toward high similarities in both content and style. However, this type of approach cannot guarantee visual fidelity, i.e., the generated artworks should be indistinguishable from real ones. In this paper, we devise a new style transfer framework called QuantArt for high visual-fidelity stylization. QuantArt pushes the latent representation of the generated artwork toward the centroids of the real artwork distribution with vector quantization. By fusing the quantized and continuous latent representations, QuantArt allows flexible control over the generated artworks in terms of content preservation, style similarity, and visual fidelity. Experiments on various style transfer settings show that our QuantArt framework achieves significantly higher visual fidelity compared with the existing style transfer methods.
translated by 谷歌翻译
One of the main challenges in deep learning-based underwater image enhancement is the limited availability of high-quality training data. Underwater images are difficult to capture and are often of poor quality due to the distortion and loss of colour and contrast in water. This makes it difficult to train supervised deep learning models on large and diverse datasets, which can limit the model's performance. In this paper, we explore an alternative approach to supervised underwater image enhancement. Specifically, we propose a novel unsupervised underwater image enhancement framework that employs a conditional variational autoencoder (cVAE) to train a deep learning model with probabilistic adaptive instance normalization (PAdaIN) and statistically guided multi-colour space stretch that produces realistic underwater images. The resulting framework is composed of a U-Net as a feature extractor and a PAdaIN to encode the uncertainty, which we call UDnet. To improve the visual quality of the images generated by UDnet, we use a statistically guided multi-colour space stretch module that ensures visual consistency with the input image and provides an alternative to training using a ground truth image. The proposed model does not need manual human annotation and can learn with a limited amount of data and achieves state-of-the-art results on underwater images. We evaluated our proposed framework on eight publicly-available datasets. The results show that our proposed framework yields competitive performance compared to other state-of-the-art approaches in quantitative as well as qualitative metrics. Code available at https://github.com/alzayats/UDnet .
translated by 谷歌翻译
Recent deep learning methods have achieved promising results in image shadow removal. However, their restored images still suffer from unsatisfactory boundary artifacts, due to the lack of degradation prior embedding and the deficiency in modeling capacity. Our work addresses these issues by proposing a unified diffusion framework that integrates both the image and degradation priors for highly effective shadow removal. In detail, we first propose a shadow degradation model, which inspires us to build a novel unrolling diffusion model, dubbed ShandowDiffusion. It remarkably improves the model's capacity in shadow removal via progressively refining the desired output with both degradation prior and diffusive generative prior, which by nature can serve as a new strong baseline for image restoration. Furthermore, ShadowDiffusion progressively refines the estimated shadow mask as an auxiliary task of the diffusion generator, which leads to more accurate and robust shadow-free image generation. We conduct extensive experiments on three popular public datasets, including ISTD, ISTD+, and SRD, to validate our method's effectiveness. Compared to the state-of-the-art methods, our model achieves a significant improvement in terms of PSNR, increasing from 31.69dB to 34.73dB over SRD dataset.
translated by 谷歌翻译
3D reconstruction and novel view synthesis of dynamic scenes from collections of single views recently gained increased attention. Existing work shows impressive results for synthetic setups and forward-facing real-world data, but is severely limited in the training speed and angular range for generating novel views. This paper addresses these limitations and proposes a new method for full 360{\deg} novel view synthesis of non-rigidly deforming scenes. At the core of our method are: 1) An efficient deformation module that decouples the processing of spatial and temporal information for acceleration at training and inference time; and 2) A static module representing the canonical scene as a fast hash-encoded neural radiance field. We evaluate the proposed approach on the established synthetic D-NeRF benchmark, that enables efficient reconstruction from a single monocular view per time-frame randomly sampled from a full hemisphere. We refer to this form of inputs as monocularized data. To prove its practicality for real-world scenarios, we recorded twelve challenging sequences with human actors by sampling single frames from a synchronized multi-view rig. In both cases, our method is trained significantly faster than previous methods (minutes instead of days) while achieving higher visual accuracy for generated novel views. Our source code and data is available at our project page https://graphics.tu-bs.de/publications/kappel2022fast.
translated by 谷歌翻译
Semi-supervised anomaly detection is a common problem, as often the datasets containing anomalies are partially labeled. We propose a canonical framework: Semi-supervised Pseudo-labeler Anomaly Detection with Ensembling (SPADE) that isn't limited by the assumption that labeled and unlabeled data come from the same distribution. Indeed, the assumption is often violated in many applications - for example, the labeled data may contain only anomalies unlike unlabeled data, or unlabeled data may contain different types of anomalies, or labeled data may contain only 'easy-to-label' samples. SPADE utilizes an ensemble of one class classifiers as the pseudo-labeler to improve the robustness of pseudo-labeling with distribution mismatch. Partial matching is proposed to automatically select the critical hyper-parameters for pseudo-labeling without validation data, which is crucial with limited labeled data. SPADE shows state-of-the-art semi-supervised anomaly detection performance across a wide range of scenarios with distribution mismatch in both tabular and image domains. In some common real-world settings such as model facing new types of unlabeled anomalies, SPADE outperforms the state-of-the-art alternatives by 5% AUC in average.
translated by 谷歌翻译
For an autonomous vehicle, the ability to sense its surroundings and to build an overall representation of the environment by fusing different sensor data streams is fundamental. To this end, the poses of all sensors need to be accurately determined. Traditional calibration methods are based on: 1) using targets specifically designed for calibration purposes in controlled environments, 2) optimizing a quality metric of the point clouds collected while traversing an unknown but static environment, or 3) optimizing the match among per-sensor incremental motion observations along a motion path fulfilling special requirements. In real scenarios, however, the online applicability of these methods can be limited, as they are typically highly dynamic, contain degenerate paths, and require fast computations. In this paper, we propose an approach that tackles some of these challenges by formulating the calibration problem as a joint but structured optimization problem of all sensor calibrations that takes as input a summary of the point cloud information consisting of ground points and pole detections. We demonstrate the efficiency and quality of the results of the proposed approach in a set of experiments with LiDAR simulation and real data from an urban trip.
translated by 谷歌翻译
This work proposes Multi-task Meta Learning (MTML), integrating two learning paradigms Multi-Task Learning (MTL) and meta learning, to bring together the best of both worlds. In particular, it focuses simultaneous learning of multiple tasks, an element of MTL and promptly adapting to new tasks with fewer data, a quality of meta learning. It is important to highlight that we focus on heterogeneous tasks, which are of distinct kind, in contrast to typically considered homogeneous tasks (e.g., if all tasks are classification or if all tasks are regression tasks). The fundamental idea is to train a multi-task model, such that when an unseen task is introduced, it can learn in fewer steps whilst offering a performance at least as good as conventional single task learning on the new task or inclusion within the MTL. By conducting various experiments, we demonstrate this paradigm on two datasets and four tasks: NYU-v2 and the taskonomy dataset for which we perform semantic segmentation, depth estimation, surface normal estimation, and edge detection. MTML achieves state-of-the-art results for most of the tasks. Although semantic segmentation suffers quantitatively, our MTML method learns to identify segmentation classes absent in the pseudo labelled ground truth of the taskonomy dataset.
translated by 谷歌翻译
这项工作提出了专门针对粒子探测器的低潜伏期图神经网络(GNN)设计的新型可重构体系结构。加速粒子探测器的GNN是具有挑战性的,因为它需要次微秒延迟才能在CERN大型强子撞机实验的级别1触发器中部署网络以进行在线事件选择。本文提出了一种自定义代码转换,并在基于互动网络的GNN中使用完全连接的图表中的矩阵乘法操作降低了强度,从而避免了昂贵的乘法。它利用了稀疏模式以及二进制邻接矩阵,并避免了不规则的内存访问,从而降低了延迟和硬件效率的提高。此外,我们引入了一种基于外部产品的基质乘法方法,该方法通过降低潜伏期设计的强度降低来增强。此外,引入了融合步骤,以进一步降低设计延迟。此外,提出了GNN特异性算法 - 硬件共同设计方法,该方法不仅找到了具有更好延迟的设计,而且在给定的延迟约束下发现了高精度的设计。最后,已经设计和开源了此低延迟GNN硬件体系结构的可自定义模板,该模板可以使用高级合成工具来生成低延迟的FPGA设计,并有效地利用资源。评估结果表明,我们的FPGA实施速度高24倍,并且消耗的功率比GPU实施少45倍。与我们以前的FPGA实施相比,这项工作的延迟降低了6.51至16.7倍。此外,我们的FPGA设计的延迟足以使GNN在亚微秒,实时撞机触发器系统中部署,从而使其能够从提高的精度中受益。
translated by 谷歌翻译